Overview

Dataset statistics

Number of variables: 19
Number of observations: 844392
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 120.0 MiB
Average record size in memory: 149.0 B
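
The two memory figures above are consistent with each other: dividing the total size in memory by the number of observations gives the average record size. A minimal check, assuming that MiB means 2^20 bytes and keeping in mind that 120.0 MiB is itself a rounded value:

```python
# Values copied from the dataset statistics above.
total_size_mib = 120.0
n_observations = 844_392

# Assumption: 1 MiB = 1024 * 1024 bytes.
total_bytes = total_size_mib * 1024 * 1024
avg_record_bytes = total_bytes / n_observations

print(f"Average record size: {avg_record_bytes:.1f} B")
```

Because the input is rounded to one decimal place, agreement is only expected to about the same precision.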

Variable types

Numeric: 10
Categorical: 9

Alerts

Promo2SinceWeek is highly correlated with Promo2 and 2 other fields (High correlation)
Promo2SinceYear is highly correlated with Promo2 and 2 other fields (High correlation)
Month is highly correlated with SchoolHoliday and 2 other fields (High correlation)
WeekOfYear is highly correlated with Season and 1 other field (High correlation)
StoreType is highly correlated with Assortment (High correlation)
Assortment is highly correlated with StoreType (High correlation)
Promo2 is highly correlated with Promo2SinceWeek and 2 other fields (High correlation)
PromoInterval is highly correlated with Promo2 and 2 other fields (High correlation)
Season is highly correlated with Month and 1 other field (High correlation)
SchoolHoliday is highly correlated with Month (High correlation)

Reproduction

Analysis started: 2022-11-28 17:13:02.312789
Analysis finished: 2022-11-28 17:14:06.114809
Duration: 1 minute and 3.8 seconds
Software version: pandas-profiling v3.4.0
Download configuration: config.json
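
The reported duration follows directly from the two timestamps. A small sketch using only the standard library; the variable names are illustrative:

```python
from datetime import datetime

# Timestamps copied from the Reproduction section above.
started = datetime.fromisoformat("2022-11-28 17:13:02.312789")
finished = datetime.fromisoformat("2022-11-28 17:14:06.114809")

elapsed_seconds = (finished - started).total_seconds()
minutes, seconds = divmod(elapsed_seconds, 60)

print(f"{int(minutes)} minute and {seconds:.1f} seconds")
```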

Variables

Store
Real number (ℝ≥0)

Distinct: 1115
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 558.4229197
Minimum: 1
Maximum: 1115
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.4 MiB

Quantile statistics

Minimum: 1
5-th percentile: 56
Q1: 280
Median: 558
Q3: 837
95-th percentile: 1060
Maximum: 1115
Range: 1114
Interquartile range (IQR): 557

Descriptive statistics

Standard deviation: 321.7319144
Coefficient of variation (CV): 0.5761438205
Kurtosis: -1.198841375
Mean: 558.4229197
Median Absolute Deviation (MAD): 278
Skewness: 0.0004137544558
Sum: 471527846
Variance: 103511.4247
Monotonicity: Increasing
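
Several of the statistics above are simple functions of the others: the mean is Sum / N, the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, and the range and IQR are differences of quantiles. A minimal sketch that recomputes them from the reported Store values (the inputs are rounded, so agreement is only up to rounding):

```python
# Values copied from the Store statistics above.
n_observations = 844_392
total = 471_527_846          # Sum
std = 321.7319144            # Standard deviation
q1, q3 = 280, 837
minimum, maximum = 1, 1115

mean = total / n_observations    # reported: 558.4229197
cv = std / mean                  # reported: 0.5761438205
variance = std ** 2              # reported: 103511.4247
iqr = q3 - q1                    # reported: 557
value_range = maximum - minimum  # reported: 1114

print(mean, cv, variance, iqr, value_range)
```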
Histogram with fixed size bins (bins=50)

Common values
Value                Count   Frequency (%)
769                  942     0.1%
1097                 942     0.1%
85                   942     0.1%
562                  942     0.1%
262                  942     0.1%
733                  942     0.1%
494                  942     0.1%
682                  942     0.1%
335                  942     0.1%
423                  942     0.1%
Other values (1105)  834972  98.9%

Minimum values
Value  Count  Frequency (%)
1      781    0.1%
2      784    0.1%
3      779    0.1%
4      784    0.1%
5      779    0.1%
6      780    0.1%
7      786    0.1%
8      784    0.1%
9      779    0.1%
10     784    0.1%

Maximum values
Value  Count  Frequency (%)
1115   781    0.1%
1114   784    0.1%
1113   784    0.1%
1112   779    0.1%
1111   779    0.1%
1110   783    0.1%
1109   622    0.1%
1108   780    0.1%
1107   623    0.1%
1106   784    0.1%

DayOfWeek
Real number (ℝ≥0)

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.520361396
Minimum: 1
Maximum: 7
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.4 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
Median: 3
Q3: 5
95-th percentile: 6
Maximum: 7
Range: 6
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 1.723689221
Coefficient of variation (CV): 0.4896341674
Kurtosis: -1.259310053
Mean: 3.520361396
Median Absolute Deviation (MAD): 2
Skewness: 0.01929954019
Sum: 2972565
Variance: 2.971104532
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)

Common values
Value  Count   Frequency (%)
6      144058  17.1%
2      143961  17.0%
3      141936  16.8%
5      138640  16.4%
1      137560  16.3%
4      134644  15.9%
7      3593    0.4%

Minimum values
Value  Count   Frequency (%)
1      137560  16.3%
2      143961  17.0%
3      141936  16.8%
4      134644  15.9%
5      138640  16.4%
6      144058  17.1%
7      3593    0.4%

Maximum values
Value  Count   Frequency (%)
7      3593    0.4%
6      144058  17.1%
5      138640  16.4%
4      134644  15.9%
3      141936  16.8%
2      143961  17.0%
1      137560  16.3%
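
Because DayOfWeek has only seven distinct values, its frequency table contains the complete distribution, so the reported Sum and Mean can be recovered from it. A short sketch (the dictionary simply restates the value/count pairs listed for DayOfWeek):

```python
# Value -> count pairs from the DayOfWeek frequency table.
day_counts = {
    1: 137560,
    2: 143961,
    3: 141936,
    4: 134644,
    5: 138640,
    6: 144058,
    7: 3593,
}

n_observations = sum(day_counts.values())
total = sum(day * count for day, count in day_counts.items())
mean = total / n_observations

print(n_observations, total, round(mean, 6))
```

The counts add up to the 844392 observations of the dataset, and the weighted sum matches the reported Sum of 2972565.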

Sales
Real number (ℝ≥0)

Distinct: 21734
Distinct (%): 2.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6955.514291
Minimum: 0
Maximum: 41551
Zeros: 54
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 6.4 MiB

Quantile statistics

Minimum: 0
5-th percentile: 3173
Q1: 4859
Median: 6369
Q3: 8360
95-th percentile: 12668
Maximum: 41551
Range: 41551
Interquartile range (IQR): 3501
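
The quartiles above are enough for a rough outlier screen. The sketch below assumes Tukey's 1.5 × IQR fences, a common convention that the report itself does not apply; it is shown only to illustrate how the quantile statistics can be used:

```python
# Values copied from the Sales quantile statistics above.
q1, q3 = 4859, 8360
p95, maximum = 12668, 41551

iqr = q3 - q1

# Assumption: Tukey's rule with the conventional multiplier of 1.5.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(f"IQR = {iqr}, fences = [{lower_fence}, {upper_fence}]")
print("95th percentile inside fences:", p95 <= upper_fence)
print("maximum inside fences:", maximum <= upper_fence)
```

Under this rule the 95-th percentile lies inside the upper fence while the maximum lies far outside it, which is consistent with the positive skewness reported for Sales.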

Descriptive statistics

Standard deviation: 3104.21468
Coefficient of variation (CV): 0.4462954931
Kurtosis: 4.852011543
Mean: 6955.514291
Median Absolute Deviation (MAD): 1694
Skewness: 1.593922039
Sum: 5873180623
Variance: 9636148.782
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value                 Count   Frequency (%)
5674                  215     < 0.1%
5558                  197     < 0.1%
5483                  196     < 0.1%
6214                  195     < 0.1%
6049                  195     < 0.1%
5723                  194     < 0.1%
5449                  192     < 0.1%
5140                  191     < 0.1%
5489                  191     < 0.1%
5041                  190     < 0.1%
Other values (21724)  842436  99.8%

Minimum values
Value  Count  Frequency (%)
0      54     < 0.1%
46     1      < 0.1%
124    1      < 0.1%
133    1      < 0.1%
286    1      < 0.1%
297    1      < 0.1%
316    1      < 0.1%
416    1      < 0.1%
506    1      < 0.1%
520    1      < 0.1%

Maximum values
Value  Count  Frequency (%)
41551  1      < 0.1%
38722  1      < 0.1%
38484  1      < 0.1%
38367  1      < 0.1%
38037  1      < 0.1%
38025  1      < 0.1%
37646  1      < 0.1%
37403  1      < 0.1%
37376  1      < 0.1%
37122  1      < 0.1%

Promo
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 6.4 MiB
0: 467496
1: 376896

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 844392
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value  Count   Frequency (%)
0      467496  55.4%
1      376896  44.6%
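
The Frequency (%) column is each count divided by the total number of observations, rounded to one decimal place. A minimal sketch for the Promo counts above; the names are illustrative:

```python
# Value -> count pairs from the Promo table above.
promo_counts = {"0": 467496, "1": 376896}

n_observations = sum(promo_counts.values())
shares = {
    value: f"{100 * count / n_observations:.1f}%"
    for value, count in promo_counts.items()
}

print(n_observations, shares)
```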

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count   Frequency (%)
0      467496  55.4%
1      376896  44.6%

Most occurring characters

Value  Count   Frequency (%)
0      467496  55.4%
1      376896  44.6%

Most occurring categories

Value           Count   Frequency (%)
Decimal Number  844392  100.0%

Most frequent character per category

Decimal Number
Value  Count   Frequency (%)
0      467496  55.4%
1      376896  44.6%

Most occurring scripts

Value   Count   Frequency (%)
Common  844392  100.0%

Most frequent character per script

Common
Value  Count   Frequency (%)
0      467496  55.4%
1      376896  44.6%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  844392  100.0%

Most frequent character per block

ASCII
Value  Count   Frequency (%)
0      467496  55.4%
1      376896  44.6%

SchoolHoliday
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 6.4 MiB
0: 680935
1: 163457

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 844392
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value  Count   Frequency (%)
0      680935  80.6%
1      163457  19.4%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count   Frequency (%)
0      680935  80.6%
1      163457  19.4%

Most occurring characters

Value  Count   Frequency (%)
0      680935  80.6%
1      163457  19.4%

Most occurring categories

Value           Count   Frequency (%)
Decimal Number  844392  100.0%

Most frequent character per category

Decimal Number
Value  Count   Frequency (%)
0      680935  80.6%
1      163457  19.4%

Most occurring scripts

Value   Count   Frequency (%)
Common  844392  100.0%

Most frequent character per script

Common
Value  Count   Frequency (%)
0      680935  80.6%
1      163457  19.4%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  844392  100.0%

Most frequent character per block

ASCII
Value  Count   Frequency (%)
0      680935  80.6%
1      163457  19.4%

StoreType
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 6.4 MiB
1: 457077
4: 258774
3: 112978
2: 15563

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 844392
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 3
2nd row: 3
3rd row: 3
4th row: 3
5th row: 3

Common Values

Value  Count   Frequency (%)
1      457077  54.1%
4      258774  30.6%
3      112978  13.4%
2      15563   1.8%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count   Frequency (%)
1      457077  54.1%
4      258774  30.6%
3      112978  13.4%
2      15563   1.8%

Most occurring characters

Value  Count   Frequency (%)
1      457077  54.1%
4      258774  30.6%
3      112978  13.4%
2      15563   1.8%

Most occurring categories

Value           Count   Frequency (%)
Decimal Number  844392  100.0%

Most frequent character per category

Decimal Number
Value  Count   Frequency (%)
1      457077  54.1%
4      258774  30.6%
3      112978  13.4%
2      15563   1.8%

Most occurring scripts

Value   Count   Frequency (%)
Common  844392  100.0%

Most frequent character per script

Common
Value  Count   Frequency (%)
1      457077  54.1%
4      258774  30.6%
3      112978  13.4%
2      15563   1.8%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  844392  100.0%

Most frequent character per block

ASCII
Value  Count   Frequency (%)
1      457077  54.1%
4      258774  30.6%
3      112978  13.4%
2      15563   1.8%

Assortment
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 6.4 MiB
1: 444909
3: 391271
2: 8212

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 844392
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value  Count   Frequency (%)
1      444909  52.7%
3      391271  46.3%
2      8212    1.0%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count   Frequency (%)
1      444909  52.7%
3      391271  46.3%
2      8212    1.0%

Most occurring characters

Value  Count   Frequency (%)
1      444909  52.7%
3      391271  46.3%
2      8212    1.0%

Most occurring categories

Value           Count   Frequency (%)
Decimal Number  844392  100.0%

Most frequent character per category

Decimal Number
Value  Count   Frequency (%)
1      444909  52.7%
3      391271  46.3%
2      8212    1.0%

Most occurring scripts

Value   Count   Frequency (%)
Common  844392  100.0%

Most frequent character per script

Common
Value  Count   Frequency (%)
1      444909  52.7%
3      391271  46.3%
2      8212    1.0%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  844392  100.0%

Most frequent character per block

ASCII
Value  Count   Frequency (%)
1      444909  52.7%
3      391271  46.3%
2      8212    1.0%

CompetitionDistance
Real number (ℝ≥0)

Distinct: 655
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5457.842215
Minimum: 20
Maximum: 75860
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.4 MiB

Quantile statistics

Minimum: 20
5-th percentile: 130
Q1: 710
Median: 2330
Q3: 6880
95-th percentile: 20390
Maximum: 75860
Range: 75840
Interquartile range (IQR): 6170

Descriptive statistics

Standard deviation: 7799.322503
Coefficient of variation (CV): 1.429012089
Kurtosis: 13.45618862
Mean: 5457.842215
Median Absolute Deviation (MAD): 1980
Skewness: 2.97902141
Sum: 4608558304
Variance: 60829431.51
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value               Count   Frequency (%)
250                 9210    1.1%
50                  6249    0.7%
350                 6239    0.7%
1200                6072    0.7%
190                 6066    0.7%
90                  5609    0.7%
180                 5422    0.6%
150                 5294    0.6%
330                 5294    0.6%
140                 4684    0.6%
Other values (645)  784253  92.9%

Minimum values
Value  Count  Frequency (%)
20     779    0.1%
30     3116   0.4%
40     3890   0.5%
50     6249   0.7%
60     2342   0.3%
70     3736   0.4%
80     2331   0.3%
90     5609   0.7%
100    3900   0.5%
110    4516   0.5%

Maximum values
Value  Count  Frequency (%)
75860  887    0.1%
58260  885    0.1%
48330  784    0.1%
46590  784    0.1%
45740  780    0.1%
44320  780    0.1%
40860  881    0.1%
40540  780    0.1%
38710  784    0.1%
38630  882    0.1%

Promo2
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 6.4 MiB
0: 423307
1: 421085

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 844392
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count   Frequency (%)
0      423307  50.1%
1      421085  49.9%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count   Frequency (%)
0      423307  50.1%
1      421085  49.9%

Most occurring characters

Value  Count   Frequency (%)
0      423307  50.1%
1      421085  49.9%

Most occurring categories

Value           Count   Frequency (%)
Decimal Number  844392  100.0%

Most frequent character per category

Decimal Number
Value  Count   Frequency (%)
0      423307  50.1%
1      421085  49.9%

Most occurring scripts

Value   Count   Frequency (%)
Common  844392  100.0%

Most frequent character per script

Common
Value  Count   Frequency (%)
0      423307  50.1%
1      421085  49.9%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  844392  100.0%

Most frequent character per block

ASCII
Value  Count   Frequency (%)
0      423307  50.1%
1      421085  49.9%

Promo2SinceWeek
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 24
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 23.12637969
Minimum: 1
Maximum: 50
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.4 MiB

Quantile statistics

Minimum: 1
5-th percentile: 5
Q1: 22
Median: 23
Q3: 23
95-th percentile: 40
Maximum: 50
Range: 49
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 9.958280574
Coefficient of variation (CV): 0.4306026584
Kurtosis: 0.2780491106
Mean: 23.12637969
Median Absolute Deviation (MAD): 0
Skewness: 0.188918715
Sum: 19527730
Variance: 99.16735199
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=24)

Common values
Value              Count   Frequency (%)
23                 426865  50.6%
14                 60541   7.2%
40                 51507   6.1%
31                 33238   3.9%
10                 32214   3.8%
5                  29722   3.5%
37                 27116   3.2%
1                  26873   3.2%
13                 24579   2.9%
45                 24072   2.9%
Other values (14)  107665  12.8%

Minimum values
Value  Count   Frequency (%)
1      26873   3.2%
5      29722   3.5%
6      771     0.1%
9      10293   1.2%
10     32214   3.8%
13     24579   2.9%
14     60541   7.2%
18     22456   2.7%
22     23645   2.8%
23     426865  50.6%

Maximum values
Value  Count  Frequency (%)
50     780    0.1%
49     622    0.1%
48     7033   0.8%
45     24072  2.9%
44     2182   0.3%
40     51507  6.1%
39     3889   0.5%
37     27116  3.2%
36     7620   0.9%
35     18888  2.2%

Promo2SinceYear
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2011.376017
Minimum: 2009
Maximum: 2015
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.4 MiB

Quantile statistics

Minimum: 2009
5-th percentile: 2009
Q1: 2011
Median: 2011
Q3: 2012
95-th percentile: 2014
Maximum: 2015
Range: 6
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.232031398
Coefficient of variation (CV): 0.0006125316137
Kurtosis: 0.5693643536
Mean: 2011.376017
Median Absolute Deviation (MAD): 0
Skewness: 0.6844175859
Sum: 1698389818
Variance: 1.517901365
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)

Common values
Value  Count   Frequency (%)
2011   518347  61.4%
2013   91866   10.9%
2014   65768   7.8%
2012   60716   7.2%
2009   53826   6.4%
2010   46414   5.5%
2015   7455    0.9%

Minimum values
Value  Count   Frequency (%)
2009   53826   6.4%
2010   46414   5.5%
2011   518347  61.4%
2012   60716   7.2%
2013   91866   10.9%
2014   65768   7.8%
2015   7455    0.9%

Maximum values
Value  Count   Frequency (%)
2015   7455    0.9%
2014   65768   7.8%
2013   91866   10.9%
2012   60716   7.2%
2011   518347  61.4%
2010   46414   5.5%
2009   53826   6.4%

PromoInterval
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 6.4 MiB
0: 423307
1: 242411
2: 98005
3: 80669

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 844392
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count   Frequency (%)
0      423307  50.1%
1      242411  28.7%
2      98005   11.6%
3      80669   9.6%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count   Frequency (%)
0      423307  50.1%
1      242411  28.7%
2      98005   11.6%
3      80669   9.6%

Most occurring characters

Value  Count   Frequency (%)
0      423307  50.1%
1      242411  28.7%
2      98005   11.6%
3      80669   9.6%

Most occurring categories

Value           Count   Frequency (%)
Decimal Number  844392  100.0%

Most frequent character per category

Decimal Number
Value  Count   Frequency (%)
0      423307  50.1%
1      242411  28.7%
2      98005   11.6%
3      80669   9.6%

Most occurring scripts

Value   Count   Frequency (%)
Common  844392  100.0%

Most frequent character per script

Common
Value  Count   Frequency (%)
0      423307  50.1%
1      242411  28.7%
2      98005   11.6%
3      80669   9.6%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  844392  100.0%

Most frequent character per block

ASCII
Value  Count   Frequency (%)
0      423307  50.1%
1      242411  28.7%
2      98005   11.6%
3      80669   9.6%

CompetitionOpenSinceMonth
Real number (ℝ)

Distinct: 376
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 63.89704782
Minimum: -32
Maximum: 1407
Zeros: 4524
Zeros (%): 0.5%
Negative: 70108
Negative (%): 8.3%
Memory size: 6.4 MiB

Quantile statistics

Minimum: -32
5-th percentile: -7
Q1: 30
Median: 64
Q3: 83
95-th percentile: 145
Maximum: 1407
Range: 1439
Interquartile range (IQR): 53

Descriptive statistics

Standard deviation: 60.80793108
Coefficient of variation (CV): 0.9516547815
Kurtosis: 172.6376179
Mean: 63.89704782
Median Absolute Deviation (MAD): 23
Skewness: 8.734165744
Sum: 53954156
Variance: 3697.604482
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value               Count   Frequency (%)
56                  12695   1.5%
60                  12693   1.5%
63                  12690   1.5%
61                  12625   1.5%
68                  12561   1.5%
65                  12434   1.5%
69                  12332   1.5%
81                  12326   1.5%
80                  12296   1.5%
62                  12259   1.5%
Other values (366)  719481  85.2%

Minimum values
Value  Count  Frequency (%)
-32    30     < 0.1%
-31    147    < 0.1%
-30    323    < 0.1%
-29    445    0.1%
-28    593    0.1%
-27    772    0.1%
-26    853    0.1%
-25    896    0.1%
-24    976    0.1%
-23    1139   0.1%

Maximum values
Value  Count  Frequency (%)
1407   5      < 0.1%
1406   25     < 0.1%
1405   25     < 0.1%
1404   23     < 0.1%
1403   23     < 0.1%
1402   26     < 0.1%
1401   26     < 0.1%
1400   21     < 0.1%
1393   23     < 0.1%
1392   24     < 0.1%

Year
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 6.4 MiB
2013: 337943
2014: 310417
2015: 196032

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Characters and Unicode

Total characters: 3377568
Distinct characters: 6
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2015
2nd row: 2015
3rd row: 2015
4th row: 2015
5th row: 2015

Common Values

Value  Count   Frequency (%)
2013   337943  40.0%
2014   310417  36.8%
2015   196032  23.2%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count   Frequency (%)
2013   337943  40.0%
2014   310417  36.8%
2015   196032  23.2%

Most occurring characters

Value  Count   Frequency (%)
2      844392  25.0%
0      844392  25.0%
1      844392  25.0%
3      337943  10.0%
4      310417  9.2%
5      196032  5.8%
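
The character counts above follow from the year counts: every value is a four-character string of the form 201x, so the characters 2, 0 and 1 each occur once per observation and the final digit occurs as often as its year. A small sketch that rebuilds the table from the Year counts, using only the standard library:

```python
from collections import Counter

# Value -> count pairs from the Year table above.
year_counts = {"2013": 337943, "2014": 310417, "2015": 196032}

char_counts = Counter()
for year, count in year_counts.items():
    for character in year:
        char_counts[character] += count

total_characters = sum(char_counts.values())
for character, count in char_counts.most_common():
    print(character, count, f"{100 * count / total_characters:.1f}%")
```

The total of 3377568 characters is four times the 844392 observations, matching the Total characters figure reported for Year.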

Most occurring categories

Value           Count    Frequency (%)
Decimal Number  3377568  100.0%

Most frequent character per category

Decimal Number
Value  Count   Frequency (%)
2      844392  25.0%
0      844392  25.0%
1      844392  25.0%
3      337943  10.0%
4      310417  9.2%
5      196032  5.8%

Most occurring scripts

Value   Count    Frequency (%)
Common  3377568  100.0%

Most frequent character per script

Common
Value  Count   Frequency (%)
2      844392  25.0%
0      844392  25.0%
1      844392  25.0%
3      337943  10.0%
4      310417  9.2%
5      196032  5.8%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  3377568  100.0%

Most frequent character per block

ASCII
Value  Count   Frequency (%)
2      844392  25.0%
0      844392  25.0%
1      844392  25.0%
3      337943  10.0%
4      310417  9.2%
5      196032  5.8%

Season
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 6.4 MiB
2: 247814
3: 222576
1: 216979
4: 157023

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 844392
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 3
2nd row: 3
3rd row: 3
4th row: 3
5th row: 3

Common Values

Value  Count   Frequency (%)
2      247814  29.3%
3      222576  26.4%
1      216979  25.7%
4      157023  18.6%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count   Frequency (%)
2      247814  29.3%
3      222576  26.4%
1      216979  25.7%
4      157023  18.6%

Most occurring characters

Value  Count   Frequency (%)
2      247814  29.3%
3      222576  26.4%
1      216979  25.7%
4      157023  18.6%

Most occurring categories

Value           Count   Frequency (%)
Decimal Number  844392  100.0%

Most frequent character per category

Decimal Number
Value  Count   Frequency (%)
2      247814  29.3%
3      222576  26.4%
1      216979  25.7%
4      157023  18.6%

Most occurring scripts

Value   Count   Frequency (%)
Common  844392  100.0%

Most frequent character per script

Common
Value  Count   Frequency (%)
2      247814  29.3%
3      222576  26.4%
1      216979  25.7%
4      157023  18.6%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  844392  100.0%

Most frequent character per block

ASCII
Value  Count   Frequency (%)
2      247814  29.3%
3      222576  26.4%
1      216979  25.7%
4      157023  18.6%

Month
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 12
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.845737525
Minimum: 1
Maximum: 12
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.4 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 3
Median: 6
Q3: 8
95-th percentile: 12
Maximum: 12
Range: 11
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 3.32393127
Coefficient of variation (CV): 0.568607683
Kurtosis: -1.033170522
Mean: 5.845737525
Median Absolute Deviation (MAD): 3
Skewness: 0.2577004047
Sum: 4936094
Variance: 11.04851908
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=12)

Common values
Value             Count   Frequency (%)
1                 86343   10.2%
3                 85980   10.2%
7                 85587   10.1%
6                 82576   9.8%
4                 81731   9.7%
2                 80243   9.5%
5                 80103   9.5%
8                 54413   6.4%
10                53292   6.3%
9                 52330   6.2%
Other values (2)  101794  12.1%

Minimum values
Value  Count  Frequency (%)
1      86343  10.2%
2      80243  9.5%
3      85980  10.2%
4      81731  9.7%
5      80103  9.5%
6      82576  9.8%
7      85587  10.1%
8      54413  6.4%
9      52330  6.2%
10     53292  6.3%

Maximum values
Value  Count  Frequency (%)
12     50393  6.0%
11     51401  6.1%
10     53292  6.3%
9      52330  6.2%
8      54413  6.4%
7      85587  10.1%
6      82576  9.8%
5      80103  9.5%
4      81731  9.7%
3      85980  10.2%

WeekOfYear
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 52
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 23.64680149
Minimum: 1
Maximum: 52
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.0 MiB

Quantile statistics

Minimum: 1
5-th percentile: 3
Q1: 11
Median: 23
Q3: 35
95-th percentile: 49
Maximum: 52
Range: 51
Interquartile range (IQR): 24

Descriptive statistics

Standard deviation: 14.389785
Coefficient of variation (CV): 0.6085298685
Kurtosis: -1.025739123
Mean: 23.64680149
Median Absolute Deviation (MAD): 12
Skewness: 0.2622827815
Sum: 19967170
Variance: 207.0659123
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value              Count   Frequency (%)
26                 20121   2.4%
12                 20099   2.4%
9                  20093   2.4%
11                 20081   2.4%
6                  20068   2.4%
5                  20065   2.4%
8                  20053   2.4%
10                 20051   2.4%
4                  20047   2.4%
3                  20043   2.4%
Other values (42)  643671  76.2%

Minimum values
Value  Count  Frequency (%)
1      15161  1.8%
2      19448  2.3%
3      20043  2.4%
4      20047  2.4%
5      20065  2.4%
6      20068  2.4%
7      20041  2.4%
8      20053  2.4%
9      20093  2.4%
10     20051  2.4%

Maximum values
Value  Count  Frequency (%)
52     8319   1.0%
51     12355  1.5%
50     12333  1.5%
49     12334  1.5%
48     12334  1.5%
47     12182  1.4%
46     12333  1.5%
45     12334  1.5%
44     11042  1.3%
43     12361  1.5%

Day
Real number (ℝ≥0)

Distinct: 31
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15.83568295
Minimum: 1
Maximum: 31
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.4 MiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 8
Median: 16
Q3: 23
95-th percentile: 30
Maximum: 31
Range: 30
Interquartile range (IQR): 15

Descriptive statistics

Standard deviation: 8.683456038
Coefficient of variation (CV): 0.5483474293
Kurtosis: -1.179690611
Mean: 15.83568295
Median Absolute Deviation (MAD): 7
Skewness: 0.01111712421
Sum: 13371524
Variance: 75.40240876
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=31)

Common values
Value              Count   Frequency (%)
11                 30121   3.6%
4                  29474   3.5%
27                 29270   3.5%
13                 29262   3.5%
23                 29241   3.5%
2                  29235   3.5%
16                 29203   3.5%
18                 29060   3.4%
28                 28367   3.4%
7                  28359   3.4%
Other values (21)  552800  65.5%

Minimum values
Value  Count  Frequency (%)
1      19368  2.3%
2      29235  3.5%
3      25058  3.0%
4      29474  3.5%
5      28176  3.3%
6      27566  3.3%
7      28359  3.4%
8      27961  3.3%
9      27068  3.2%
10     28159  3.3%

Maximum values
Value  Count  Frequency (%)
31     15924  1.9%
30     26326  3.1%
29     23575  2.8%
28     28367  3.4%
27     29270  3.5%
26     26168  3.1%
25     27065  3.2%
24     28163  3.3%
23     29241  3.5%
22     27988  3.3%

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 6.4 MiB
0: 843482
1: 910

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 844392
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count   Frequency (%)
0      843482  99.9%
1      910     0.1%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count   Frequency (%)
0      843482  99.9%
1      910     0.1%

Most occurring characters

Value  Count   Frequency (%)
0      843482  99.9%
1      910     0.1%

Most occurring categories

Value           Count   Frequency (%)
Decimal Number  844392  100.0%

Most frequent character per category

Decimal Number
Value  Count   Frequency (%)
0      843482  99.9%
1      910     0.1%

Most occurring scripts

Value   Count   Frequency (%)
Common  844392  100.0%

Most frequent character per script

Common
Value  Count   Frequency (%)
0      843482  99.9%
1      910     0.1%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  844392  100.0%

Most frequent character per block

ASCII
Value  Count   Frequency (%)
0      843482  99.9%
1      910     0.1%

Interactions

[Figure: interaction plots between numeric variables (Matplotlib v3.5.3)]
2022-11-28T17:14:01.220665image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-11-28T17:14:02.470412image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-11-28T17:14:03.886803image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-11-28T17:13:53.432518image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-11-28T17:13:54.554510image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-11-28T17:13:55.780718image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-11-28T17:13:56.962101image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-11-28T17:13:58.028216image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-11-28T17:13:59.162464image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-11-28T17:14:00.235867image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-11-28T17:14:01.321961image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-11-28T17:14:02.690728image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-11-28T17:14:04.025449image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-11-28T17:13:53.560668image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-11-28T17:13:54.686951image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-11-28T17:13:55.930311image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-11-28T17:13:57.093972image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-11-28T17:13:58.157599image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-11-28T17:13:59.288321image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-11-28T17:14:00.364497image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-11-28T17:14:01.444199image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-11-28T17:14:02.845919image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-11-28T17:14:04.139271image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-11-28T17:13:53.670146image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-11-28T17:13:54.795406image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-11-28T17:13:56.067795image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-11-28T17:13:57.202278image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-11-28T17:13:58.264814image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-11-28T17:13:59.393094image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-11-28T17:14:00.473355image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-11-28T17:14:01.559538image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-11-28T17:14:02.971844image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/

Correlations

Auto

The auto setting is an easily interpretable pairwise column metric that selects a method according to the variable types of each pair: categorical-categorical uses Cramér's V, numerical-categorical uses Cramér's V (on a discretized numerical column), and numerical-numerical uses Spearman's ρ. This configuration picks the most suitable method for each pair of columns.

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic correlations. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
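
As a rough illustration of the rank-based calculation described above, the following sketch uses only the Python standard library. The helper names (average_ranks, pearson, spearman) and the sample data are assumptions made for this example; they are not the functions this report uses internally.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson's r applied to the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


# Illustrative data: y is a nonlinear but strictly increasing function of x.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman(x, y)  # close to 1: the relationship is perfectly monotonic
r = pearson(x, y)     # below 1: the relationship is not linear
```

On this sample, ρ reaches 1 (up to floating-point rounding) while r stays below 1, which is the distinction between monotonic and linear correlation drawn in the text.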

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
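
The formula and the invariance property can be checked with a short standard-library sketch. The function name pearson and the sample values are assumptions chosen for illustration. The invariance holds for positive scale factors; a negative factor flips the sign of r.

```python
from math import sqrt


def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    # The 1/n factors of the covariance and the standard deviations cancel.
    return cov / (sx * sy)


# Illustrative data only.
x = [2.0, 4.0, 5.0, 7.0, 9.0]
y = [1.0, 3.0, 2.0, 6.0, 8.0]

r_original = pearson(x, y)
# Shift and rescale each variable separately, with positive scale factors.
r_rescaled = pearson([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
# A negative scale factor on one variable reverses the sign of r.
r_flipped = pearson(x, [-b for b in y])
```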

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
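
The pair-counting definition above corresponds to the variant usually called tau-a. A minimal sketch, assuming no special handling of ties (tied pairs count as neither concordant nor discordant), could look like this. The function name kendall_tau_a and the data are illustrative; statistical libraries often report the tie-adjusted tau-b instead, so their results can differ when ties are present.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, following the text."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1  # both variables order this pair the same way
        elif sign < 0:
            discordant += 1  # the variables order this pair in opposite ways
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Illustrative data: 7 concordant and 3 discordant pairs out of 10.
x = [1, 2, 3, 4, 5]
y = [3, 1, 2, 5, 4]
tau = kendall_tau_a(x, y)  # (7 - 3) / 10
```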

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
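
For orientation, the bias correction can be sketched from a contingency table of counts as follows. This is a standard-library illustration of the commonly cited form of Bergsma's correction, not the code used to produce this report; the function name cramers_v_corrected and the example tables are assumptions, and the sketch assumes every row and column has a non-zero total.

```python
from math import sqrt


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as rows of counts."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(k)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Subtract the expected bias of phi^2, and shrink the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))


# Illustrative tables: perfect association and exact independence.
v_perfect = cramers_v_corrected([[50, 0], [0, 50]])
v_independent = cramers_v_corrected([[25, 25], [25, 25]])
```

The first table yields a value of 1 (up to rounding) and the second yields 0, matching the two ends of the range described in the text.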

Phik (φk)

Phik (φk) is a practical correlation coefficient that works consistently across categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available for Phik.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

   Store  DayOfWeek  Sales  Promo  SchoolHoliday  StoreType  Assortment  CompetitionDistance  Promo2  Promo2SinceWeek  Promo2SinceYear  PromoInterval  CompetitionOpenSinceMonth  Year  Season  Month  WeekOfYear  Day  IsOpenOnPublicHoliday
0      1          5   5263      1              1          3           1               1270.0       0               23             2011              0                         84  2015       3      7          31   31                      0
1      1          4   5020      1              1          3           1               1270.0       0               23             2011              0                         84  2015       3      7          31   30                      0
2      1          3   4782      1              1          3           1               1270.0       0               23             2011              0                         84  2015       3      7          31   29                      0
3      1          2   5011      1              1          3           1               1270.0       0               23             2011              0                         84  2015       3      7          31   28                      0
4      1          1   6102      1              1          3           1               1270.0       0               23             2011              0                         84  2015       3      7          31   27                      0
5      1          6   4364      0              0          3           1               1270.0       0               23             2011              0                         83  2015       3      7          30   25                      0
6      1          5   3706      0              0          3           1               1270.0       0               23             2011              0                         83  2015       3      7          30   24                      0
7      1          4   3769      0              0          3           1               1270.0       0               23             2011              0                         83  2015       3      7          30   23                      0
8      1          3   3464      0              0          3           1               1270.0       0               23             2011              0                         83  2015       3      7          30   22                      0
9      1          2   3558      0              0          3           1               1270.0       0               23             2011              0                         83  2015       3      7          30   21                      0

Last rows

        Store  DayOfWeek  Sales  Promo  SchoolHoliday  StoreType  Assortment  CompetitionDistance  Promo2  Promo2SinceWeek  Promo2SinceYear  PromoInterval  CompetitionOpenSinceMonth  Year  Season  Month  WeekOfYear  Day  IsOpenOnPublicHoliday
844382   1115          6   4497      0              0          4           3               5350.0       1               22             2012              3                         55  2013       1      1           2   12                      0
844383   1115          5   5142      1              1          4           3               5350.0       1               22             2012              3                         55  2013       1      1           2   11                      0
844384   1115          4   5007      1              1          4           3               5350.0       1               22             2012              3                         55  2013       1      1           2   10                      0
844385   1115          3   4649      1              1          4           3               5350.0       1               22             2012              3                         55  2013       1      1           2    9                      0
844386   1115          2   5243      1              1          4           3               5350.0       1               22             2012              3                         55  2013       1      1           2    8                      0
844387   1115          1   6905      1              1          4           3               5350.0       1               22             2012              3                         55  2013       1      1           2    7                      0
844388   1115          6   4771      0              1          4           3               5350.0       1               22             2012              3                         54  2013       1      1           1    5                      0
844389   1115          5   4540      0              1          4           3               5350.0       1               22             2012              3                         54  2013       1      1           1    4                      0
844390   1115          4   4297      0              1          4           3               5350.0       1               22             2012              3                         54  2013       1      1           1    3                      0
844391   1115          3   3697      0              1          4           3               5350.0       1               22             2012              3                         54  2013       1      1           1    2                      0